DIMACS Series in Discrete Mathematics 
and Theoretical Computer Science 



Minimizing Polynomial Functions 

Pablo A. Parrilo and Bernd Sturmfels 



Abstract. We compare algorithms for global optimization of polynomial 
functions in many variables. It is demonstrated that existing algebraic methods 
(Grobner bases, resultants, homotopy methods) are dramatically outperformed 
by a relaxation technique, due to N.Z. Shor and the first author, which involves 
sums of squares and semidefinite programming. This opens up the possibility 
of using semidefinite programming relaxations arising from the Positivstellen- 
satz for a wide range of computational problems in real algebraic geometry. 



1. Introduction 

This is an expository and experimental paper concerned with the following 
basic problem. Given a multivariate polynomial function / G K[a;i, . . . , Xn] which 
is bounded below on R", find the global minimum /* and a point p* attaining it: 

(1.1) r = fiP*) = min{/(p) : PGR"}. 

Exact algebraic algorithms for this task find all the critical points and then identi- 
fying the smallest value of / at any critical point. Such methods will be discussed 
in Section 2. The techniques include Grobner bases, resultants, eigenvalues of com- 
panion matrices | CL098| , and numerical homotopy methods [ Li97|, |Ver ] 



An entirely different approach was introduced by N.Z. Shor ([Sho87|, |SS97|) 



and further developed in the dissertation of the first author |ParOO]. The idea is 
to compute the largest real number A such that f{x) — A is a sum of squares in 
M.[xi, . . . ,Xn]- Clearly, A is a lower bound for the optimal value /*. We show in 
Section 3 that, when the degree of / is fixed, the lower bound A can be computed 



in polynomial time using semidefinite programming |VB96]. If A = /* holds then 
this is certified by semidefinite programming duality, and the certificate yields the 
optimal point p*. In our computational experiments, to be presented in Section 5, 
we found that X — f* almost always holds, and we solved problems up to n = 15. 

The objective of this article is to provide a bridge between mathematical pro- 
gramming and algebraic geometry, demonstrating that algorithms from the former 
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have the potential to play a major role in future algorithms in the latter. This will 
be underlined in Section 6, where we present open problems, and in Section 7 where 
we show that semidefinite programming in conjunction with the Positivstellensatz 
is applicable to a wide range of computational problems in real algebraic geometry. 

2. Computational Algebra 

In this section we discuss the following approach to our problem ( |l.l| ) . Form the 
partial derivatives of the given polynomial / and consider the ideal they generate: 

The zeros of the ideal / in complex n-space C" are the critical points of /. Their 
number (counting multiplicity) is the dimension over K of the residue ring: 

- dimRK[x]// = #Vc(/). 

We shall assume that /i is finite. (If /i = +oo then one can apply perturbation 
techniques to reduce to the case /i < +oo). For instance, if / is a dense polynomial 
of even degree 2d then it follows from Bezout's Theorem that /x = {2d — 1)". 
Consider the subset of real critical points: 

Vm(/) = {/'\p(^\..., C R". 

This set is usually much smaller than the set of all complex critical points, i.e., 
typically we have v <^ ^. If we know the set V]i(/), then our problem is solved. 

Lemma 2.1. The optimal value is attained at a critical point: 

(2.1) r - min{/(pW),/(p(2)),...,/(pM)} 

The three techniques to be described in this section all compute the set Vr(/) 
of real critical points. We will illustrate then for the following example: 

(2.2) Minimize f{x,y,z) — + y"^ + z"^ ~ Axyz + x + y + z 

The optimal value for this problem is /* = —2.112913882, and, disregarding sym- 
metry, there are three optimal points {x*,y*, z*) attaining this value: 

(0.988, -1.102, -1.102) , (-1.102, 0.988, -1.102) , (-1.102, -1.102, 0.988). 
2.1. Grobner bases and eigenvalues. We review the method of solving 



polynomial equations by means of Grobner bases and eigenvalues [CL098, §2.4]. 
We are free to choose an arbitrary term order -< on the polynomial ring M.[xi , . . ., Xn] ■ 
Let Q he a. Grobner basis for the critical ideal / with respect to -<. While computing 
Grobner bases is a time-consuming task in general, this is not an issue in this paper, 
since in all our examples the n given generators df /dxi already form a Grobner 
basis in the total degree order. In our example the Grobner basis is 

(2.3) g = {xi-yz + 1/4, ^-a;z + l/4, zi-j;y + l/4} 

A monomial x\^ ■ ■ ■ a;J^" is standard if it is not divisible by the leading term of any 
element in the Grobner basis g. The set B of standard monomials is an R-basis for 
the residue ring R[x]//. The standard monomials for ( |2.3| ) are: 

(2.4) B = {xVz''^ : z,j,A;gN, i,j,fc<2}, ^ = #(^) = 27. 
For any polynomial g G R[x] consider the R- linear endomorphism: 

Timesg : R[x]// R[x]//, h ^ g ■ h. 
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This endomorphism is represented in the basis B hy a real fi x /i-matrix Tg. The 
entry of Tg with row index x" G and column index G S is the coefficient of 
x" in the normal form of x" • g(x) with respect to Q. 

Proposition 2.2. The optimal value f* is the smallest real eigenvalue of the 
matrix Ty. Any eigenvector of T f with eigenvalue f* defines an optimal point 
p* = (pI, . . . ,p* ) by the eigenvector identities T^^ ■ v = pi ■ v for i = 1, . . . , rt. 



Proof. This follows from Lemma 2.1 and Theorem (4.5) in the book of Cox- 
Little-O'Shea |CL098t ; see also [ |CL098| ' Exercise 17, page 62]. □ 



The resulting algorithm is to compute symbolically the matrices T f and T^^ . for 
i = 1, . . . , n, then compute numerically its eigenvalues (and matching eigenvectors) 
of Tf, and finally determine /* and p* as in the proposition. 

In our example the matrix T f has format 27x27 with rows and columns indexed 



by (2.4). Of its 729 entries only 178 are nonzero. For instance, the column indexed 



by xyz has four nonzero entries, namely, the coefhcients of 

normalformg(a;?;z • /) = —x^yz + —xy'^z + —xyz'^ — x^y^z^. 

The matrix Tj has maximal rank 27. Of its eigenvalues only three are real: 

-0.8692394998, -0.8702981639, -2.112913879 
The three real eigenvalues have multiplicity 3,1, and 3 respectively. 



2.2. Resultants and discriminants. One algebraic method for solving poly- 
nomial equations is to use resultants. Closely related to resultants are discriminants. 
They express the condition on a hypersurface to have a singularity, by means of a 
polynomial in the coefficients its defining equation. Let t be a new indeterminate 
and form the discriminant of the polynomial f{x) — t with respect to xi, . . . , a;„ 



8{t) := A4/(a;i,...,2;„) -i) 



Here A^, refers to the A- discriminant^ defined in | GKZ9^, Chapter 9], where A is 



the support of / together with the origin. From [ |GKZ9^ , §10.1.H] we conclude 



that the discriminant bif) equals the characteristic polynomial of the matrix Ty. 
Corollary 2.3. The optimal value f* is the smallest real root of S{t). 

In our example, 5{t) is 256t^ — 512t^ — 96t + 473 times the third power of 

65536i^ + 393216^^ + 1056768i'' + 1011712t3 - 421376i2 - 437152i + 166419. 

The optimal value /* = —2.11... is a root of this sextic. This sextic has Galois 
group 5*6, so /* cannot be expressed in radicals over the rationals. 

The suggested algorithm is to compute S{t), and minimal polynomials for the 
coordinates x* of the optimal point, by elimination of variables using matrix for- 



mulas for resultants and discriminants |GKZ94, Chapter 13]. The subsequent 



numerical computation is to find the roots of a univariate polynomial. 



4 PABLO A. PARRILO AND BERND STURMFELS 



2d \ n 


3 


5 


7 


9 


11 


13 


15 


2 


1 


1 


1 


1 


1 


1 


1 


4 


27 


243 


2187 


19683 


177147 


1594323 




6 


125 


3125 


78125 


1953125 


48828125 






8 


343 


16807 


823543 


40353607 


1977326743 






10 


729 


59049 


4782969 


387420489 








12 


1331 


161051 


19487171 











Table 1. The Bezout number /i = {2d — 1)" for the critical equations. 



2.3. Homotopy methods. The critical equations form a square system: n 
equations in n variables having finitely many roots. Such a system is well-suited 
for numerical homotopy continuation methods. For an introduction to this subject 



see the papers of Li [Li97| and Verschelde [Ver|. The basic idea is to introduce 
a deformation parameter t into the given system. For instance, we might replace 
( ^.3| ) by the following system which depends on a complex parameter r: 

(2.5) Hr : - T ■ yz + 1/4 ^ - T ■ xz + 1/4 = - T ■ xy + 1/4: ^ 0. 

The solutions (a;(T), ?/(t), z(t)) are algebraic functions of r. Our goal is to find 
them for t ~ 1. It is easy to find the solutions for t = 0: 

(a;(0),2;(0),z(0)) = ( 4-1/^ ■ r;i, 4-1/^ • r,2, 4"'/' • r/3 ) , v! = I- 

Homotopy methods trace the full set of solutions from r = to r = 1 along a 
suitable path in the complex r-plane. We determine /* by evaluating the objective 
function f{x, y, z) at all the real solutions for t = 1. 

Homotopy methods are frequently set up so that the system at t = breaks 
up into several systems, each of which consists of binomials. If the input polyno- 
mials are sparse, then these are the polyhedral homotopies which take the Newton 
polytopes of the given equations into consideration. In the sparse case, the number 
fi will be the mixed volume of the Newton polytopes. For an introduction to these 



polyhedral methods see |CL098, Chapter 7] and the references given there. 



2.4. How large is the Bezout number ? A common feature of all three 
algebraic algorithms in this section is that their running time is controlled by the 
number /i of complex critical points. In the eigenvalue method we must perform 
linear algebra on matrices of size /i x /i, in the discriminant method we must find 
and solve a univariate polynomial of degree and in the homotopy method, we are 
forced to trace /i paths from r = to r = 1. Each of these three methods becomes 
infeasible if the number /i is too big; for instance, /z > 10, 000 might be too big. 

Suppose that the given polynomial / in K[xi, . . . , Xn] has even degree 2d and 
is dense. This will be the case in the family of examples studied in Section 5. Then 
H coincides with the Bezout number {2d — 1)". Some small values for the Bezout 
number are listed in Table |^. Most entries in this table are bigger than 10, 000. We 
are led to believe that the algebraic methods will be infeasible for quartics if n > 8. 

Each entry in the first row of Table |^ is a one. This means we can minimize 
quadratic polynomial functions by solving a system of linear equations (in polyno- 
mial time) . The punchline of this paper is to reduce our problem to a semidefinite 
programming problem which can also be solved in polynomial time for fixed d. 



MINIMIZING POLYNOMIAL FUNCTIONS 



5 



3. Sums of Squares and Semidefinite Programming 



We present the method in troduced by N.Z. Shor (| Sho87 |, | SS97 ), and further 
extended by the first author |ParOO|, for minimizing polynomial functions. This 
method is a relaxation: it always produces a lower bound for the value of /*. 
However, as we shall see in Section 5, this bound very frequently agrees with /*. 

We may assume that the given polynomial f{xi,...,Xn) has even degree 2d. 
Let X denote the column vector whose entries are all the monomials in xi, . . . , x„ 
of degree at most d. The length of the vector X equals the binomial coefficient 

Let Cf denote the set of all real symmetric N x A'^-matrices A such that /(x) = 
X^ -A-X. This is an affine subspace in the space of real symmetric N x A^-matrices. 
Assume that the constant monomial 1 is the first entry of X. Let En denote the 
matrix unit whose only nonzero entry is a one in the upper left corner. 

Lemma 3.1. For any real number A, the following two are equivalent: 

• The polynomial /(x) — X is a sum of squares in R[x]. 

• There is a matrix A £ Cf such that ^ — A ■ En is positive semidefinite, 
that is, all eigenvalues of this symmetric matrix are non-negative reals. 



Proof. The matrix A — X ■ En is positive semidefinite if and only if there 
exists a real Cholesky factorization A ~ X ■ En — ■ B. li this holds then 

/(x)-A ^ X'^-A-X-X = X'^ ■{A~X-En)-X = X'^B^B-X ^ {BX)^-{BX) 

is a sum of squares, and every sum of squares representation arises in this way. □ 

We write /*°* for the largest real number A for which the two equivalent condi- 
tions are satisfied. We always have /* > f"''. This inequality may be strict. It is 
even possible that = —oo. An example of this form is Motzkin's polynomial 

(3.1) m{x,y) = x'^y'^ + x^y'^ — ix^y"^ . 

It satisfies m{x, y) > —1, but m{x, y) — A is not a sum of squares for any A € R. We 
refer to |RezOO| for an excellent survey of the problem of representing a polynomial 
as a sum of squares, and the important role played by Motzkin's example. 

Sums of squares are crucial for us because of the following complexity result. 

Theorem 3.2. Fix deg(/) = 2d and let the number of variables n vary. Then 
there exists a polynomial-time algorithm, based on semidefinite programming, for 
computing Z**"** from f . The same statement holds if n is fixed and d varies. 

Semidefinite programming (SDP) is the study of optimization problems over the 
cone of all positive semidefinite matrices. This branch of optimization has received 
a lot of attention in recent years, both for its theoretical elegance and its practical 
applications. Semidefinite programs can be solved in polynomial time, using interior 
point methods; see [ [NN94| , [[WSVOO| , [|VB96| . This complexity resuh (together 
with Lemma 3.1) implies Theorem 3.2 because the quantity N 



'ri-\-d\ 



fn-\-d\ 



grows polynomially if either n or d is fixed. This result appears in |ParO0| ]. 

Available implementations of interior-point methods for semidefinite program- 
ming perform extremely well in practice, say, for problems involving matrices up 
to 500 rows and columns (provided there are not too many variables). This allows 
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2d \ n 


3 


5 


7 


9 


11 


13 


15 


2 


4 


6 


8 


10 


12 


14 


16 


4 


10 


21 


36 


55 


78 


105 


136 


6 


20 


56 


120 


220 


364 


560 


816 


8 


35 


126 


330 


715 


1365 


2380 


3876 


10 


56 


252 


792 


2002 


4368 


8568 


15504 


12 


84 


462 


1716 


5005 


12376 


27132 


54264 



Table 2. The matrix size iV = ("+'') for the semidefinite programs. 



for the efficient computation of f^"" , and as we shall see in Section 4, SDP duality 
furnishes a polynomial-time test to check whether /* = /*°* and for computing 
the optimal point p* in the affirmative case. A comparison of Tables |l| and ^ sug- 
gests that SDP has the potential to compute much larger instances than algebraic 
methods. Section 5 will show that this is indeed the case. 



Our example (2.2) has parameters c? = 2,n = 3. The affine space Cf consists 
of all 10 X 10-matrices ^(A, c) with A = and q G M arbitrary in the family 



-A 


1/2 


1/2 


1/2 


Cl 


-C2 


C3 


-C4 


-cs 


C6 


1/2 


-2ci 


C2 


C4 





C7 


-C8 


-C9 


ClO 


Cl2 


1/2 


C2 


-2C3 


C5 


-C7 


C8 





Cl3 


Cl4 


Cl5 


1/2 


C4 


C5 


-2C6 


C9 


—ClO — Cl3 — 2 


— Cl4 


-C12 


-Cl5 





Cl 





-C7 


C9 


1 





Cl6 





Cl7 


Cl8 


-C2 


C7 


C8 


-ClO - Ci3 - 2 





-2C16 





-Ci7 


— Ci9 


-Cll 


C3 


-C8 





-Cl4 


Cl6 





1 


Cl9 





C20 


-C4 


-C9 


Cl3 


-Ci2 





-Cl7 


Cl9 


-2C18 


Cll 





-C5 


ClO 


Ci4 


-CIS 


Cl7 


— Ci9 





Cll 


-2C20 





C6 


Cl2 


Cl5 





Cl8 


-Cll 


C20 








1 



The rows and columns of this matrix are indexed by the entries of the vector 

X = \^ 1 X y z xy xz yz ^ . 

We invite the reader to check the identity 

X'^-A{\c)-X = f{x,y,z) - X for all Cl, . . . , C20 e M. 

The lower bound is the largest real number A such that, for some choice of 
Cl, . . . , C20 G K, the matrix ^(c. A) has all eigenvalues nonnegative. We find that 

= /* = -2.112913882, 

and the optimal matrix (to five digits) is given by: 



2.1129 


0.5000 


0.5000 


0.5000 


-0.4678 


-0.0922 


-0.4678 


-0.0922 


-0.0922 


-0.4678 


0.5000 


0.9356 


0.0922 


0.0922 


-0.0000 


0.0892 


-0.0892 


0.0892 


-0.6666 


-0.0892 


0.5000 


0.0922 


0.9357 


0.0922 


-0.0892 


0.0892 


0.0000 


-0.6667 


0.0892 


-0.0892 


0.5000 


0.0922 


0.0922 


0.9356 


-0.0892 


-0.6666 


-0.0892 


0.0892 


0.0892 


0.0000 


-0.4678 


-0.0000 


-0.0892 


-0.0892 


1.0000 


-0.0000 


-0.3180 


0.0000 


0.0554 


-0.3181 


-0.0922 


0.0892 


0.0892 


-0.6666 


-0.0000 


0.6360 


0.0000 


-0.0554 


-0.0554 


0.0554 


-0.4678 


-0.0892 


0.0000 


-0.0892 


-0.3180 


0.0000 


1.0000 


0.0554 


-0.0000 


-0.3180 


-0.0922 


0.0892 


-0.6667 


0.0892 


0.0000 


-0.0554 


0.0554 


0.6361 


-0.0554 


0.0000 


-0.0922 


-0.6666 


0.0892 


0.0892 


0.0554 


-0.0554 


-0.0000 


-0.0554 


0.6360 


0.0000 


-0.4678 


-0.0892 


-0.0892 


0.0000 


-0.3181 


0.0554 


-0.3180 


0.0000 


0.0000 


1.0000 
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This matrix is positive semidefinite. By computing a factorization ■ B as in 



the proof of Lemma 3.1, we can express / — /*°* as a sum of squares. In the next 
section we show how to recover the points at which the optimal value is achieved. 
Note that the number 20 of free parameters is the case "n = 3, c? = 2" of: 

Remark 3.3. The dimension of Cf equals the number of linearly independent 
quadratic relations among the monomials of degree < d in n variables. It equals 

2 



dim Cf = - 
' 2 



n + d\ /n + d 
d ) d 



n + 2d 
2d 



The codimension ( with respect to the space of symmetric matrices ) is equal to 

+ 2d\ 



codim C 



f 



2d 



4. Semidefinite Programming Duality 

In Section 3 we demonstrated that computing /*°*' is equivalent to minimizing 
a linear functional over the intersection of the affine space Cf with the cone of 
positive semidefinite TV x TV-matrices. In our discussion we have represented the 
space Cf hy a spanning set of matrices. For numerical efhciency reasons it is usually 
preferable to represent Cf hy its defining equations (unless n and d are very small) . 

Duality is a crucial feature of semidefinite programming. It plays an important 
role in designing the most efficient interior-point algorithms. In what follows we 
review the textbook formulation of SDP duality, in terms of matrices. Thereafter 
we present a reformulation in algebraic geometry language, and we then explain 
how to test the condition f"" = f* and how to recover the optimal point p*. 

4.1. Matrix Formulation. Let 5^ denote the real vector space of symmetric 
N X iV-matrices, with the inner product A • B := trace (AB), and the Lowner 
partial order given hy A ^ B if B — A is positive semidefinite. Recall that A £ 
is positive semidefinite if x'^Ax > 0, for all x E M.^ . This condition is equivalent to 
nonnegativity of all eigenvalues of A, and to nonnegativity of all principal minors. 



The general SDP problem (| VB96|, |WSVOO|) can be expressed in the form: 



minimize F • X 
(4.1) subject to gx = b 

X h 

where X,F € , b € M.'^ , and Q : — > is a linear operator. This is usually 
called the primal form, in analogy with the linear programming (LP) case. 

Notice that (|4.l[) is a convex optimization problem, since the objective function 
is linear and the feasible set is convex. There is an associated dual problem: 

, , maximize b'^y 

^ ' subject to F ~g*yyO 

where y G M*^ and Q* : R^^ — > iS^ is the operator adjoint to Q. Any feasible 
solution of the dual problem is a lower bound of the optimal value of the primal: 

F»X-b^y = FmX-y'^gX = {F -g*y)»X >0. 

The last inequality holds since the inner product of two positive semidefinite ma- 
trices is nonnegative. The converse statement (primal feasible solutions give upper 
bounds on the optimal dual value) is obviously also true. The inequality above 
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is called weak duality. Under certain conditions (notably, the existence of strictly 
feasible solutions), strong duality also holds: the optimal values of the primal and 
the dual problems coincide. If strong duality holds, then at optimality the matrix 
X ■ {F - g*y) is zero, since A,B ^0, trace(AS) = implies AB = 0. This can be 
interpreted as a generalization of the usual complementary slackness LP conditions. 



Practical implementations of SDP (we will use SeDuMi [ 3tu99|) simultaneously 



compute both the optimal matrix X for (|4.lD and the optimal vector y for (4.2) 



4.2. Polynomial Formulation. We set m = — 1 and we identify 5^ 
with the real vector space R[x]2 = M[a;o, xi, . . . , Xmh of quadratic forms in m + 1 
variables. The vector space dual to 5^ is now denoted M[9]2 = K[(?o, di, . . . , dmh- 
The dual pairing is given by differentiation and is denoted •. For any / G M.[d]2 
and any real vector p = {po, . . . ,Pm) G K.™^^, the following familiar identity holds: 

_^ m 

(4.3) f{do,...,dm)»-^(^PiXif = f{po,...,pm). 

i=0 

We consider the general quadratic programming problem: 

(4.4) Minimize f{p) subject to go{p) — 1 and gi{p) = ■ ■ ■ = gr{p) = 0, 

where f,go,...,gr £ M[9]2 are given and we are looking for an optimal point p G 
jgm+i^ This problem can be relaxed to the following primal SDP: 

Minimize f{d) • q{x) subject to q{x) >z 

and go{d) • ~ 1 and g\{d) • q{x) = ■ ■ ■ = gr{d) • q{x) = 0. 

The inequality q{x) ^ means that q is non-negative on i.e., q is in the 



positive semidefinite cone in M[a;]2. In view of ( |4.3| ), the optimal value of (4.4) is 
greater than or equal to the optimal value of the primal SDP, and equality holds if 
and only if there is an optimal solution of the form q{x) — ^(^^^QPiXi)'^ ■ 

Every semidefinite progr amming problem comes with a dual problem, as in the 
previous subsection; see also [ VB96| . In our case the dual SDP takes the form: 



Maximize the first coordinate A of the vectors {X, ^i, . . . , fir) G K''+^ 

subject to the conditions f{d) + J2l=i Mi ' 9i{9) — ^ ■ go{d) h 

Assuming the existence of a strictly feasible primal solution, the maximum value in 
the dual SDP is always equal to the minimum value in the primal SDP. Under 
this regularity assumption, which is easy to satisfy in our application, we conclude: 

Proposition 4.1. // the primal SDP has an optimal solution of the form 



q{x) = ^(^"^QPiXi)'^ then the vector {po, . . . ,Pm) is an optimal solution for (4-4) 



4.3. Minimizing Quadratic Functions over Toric Varieties. A toric va- 
riety is an algebraic variety, in afhne space or projective space, which has a para- 
metric representation by monomials. Equivalently, a toric variety is an irreducible 
variety which is cut out by binomial equations, that is, differences of monomials. 
Here we will be interested in those projective toric varieties which are defined by 
quadratic binomials. This class includes many examples from classical algebraic 



geometry, such as Veronese and Segre varieties. See [Stu95 for an introduction. 

Let X be a toric variety in projective m-space whose defining prime ideal is 
generated by quadratic binomials gi, . . . ,gr in R[do, . . . , dm]- Each generator has 
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the form didj — dkdi for some i,j,k,l G {0, 1, ... , to}. We set go{d) — Oq. Then 
the equation go(<9) = 1 on X defines an affine toric variety X, such that X is the 
projective closure of X. Every quadratic polynomial function on the affine variety 
X is represented by a quadratic form / e R[d]2 as above. This representation is 
unique modulo the M-linear span of 51 , . . . , g^- Our problem ( 4.4 ) is hence equivalent 
to minimizing a quadratic function over an affine toric variety defined by quadrics: 



(4.5) Minimize f{p) subject to p ^ X 

The optimal value of the dual SDP relaxation in Subsection 4.2 is the largest 
real number A such that / — A is a sum of squares in the coordinate ring of X. 



Let us now return to our original problem (lA) where the given polynomial 
is dense of degree 2d in n variables. Here X is the Veronese variety in projective 
A^-dimensional space which is parameterized by all monomials of degree at most 



d. (If the polynomial in (1.1) is sparse then another toric variety can be used.) 



Writing our given polynomial as a quadratic form in homogeneous coordinates on 



X, our minimization problem (1.1) is precisely the quadratic toric problem (4.5). 

We solve (^|^) by simultaneously solving the primal and dual SDP relaxation in 
Subsection 4.2. If the optimal value A of the dual SDP agrees with the true minimum 
of / over X then the primal SDP has an optimal solution q{x) = ^(X]i!loP*^«)^ 
which exhibits an optimal point {po, . . . ,Pm) G X at which / is minimized. 

In our running example, we have m — 9 and r — 20, and X is the quadratic 
Veronese three-fold in projective 9-space which is given parametrically as 

{xo : Xi : ■ ■ ■ : Xg) ~ {l : r : s : t : r^ : rs : : rt : st : t^^ . 

It is cut out by twenty quadratic binomials such as xpa^s — xiX2- These binomials 
correspond to the parameters Ci in the 10 x 10-matrix yl(A,c) in Section 2. 



5. Experimental Results 

We now present our computational experience with Shor's relaxation for global 
minimization of polynomial functions. As mentioned earlier, the computational 
advantages of our method are based on the following three independent facts: 

• The dimension N of the matrix required in the sum of squares formu- 
lation is much smaller than the Bezout number /i, since it only scales 
polynomially with the number of variables. See Tables [l] and || above. 

• Semidefinite programming provides an efficient algorithm for deciding 
whether a polynomial is a sum of squares, and to find such representations 
for polynomials whose coefficients may depend linearly on parameters. 

• The lower bound f"" very often coincides with the exact solution /* of 
our problem (^]|), at least for the class of problems analyzed here. 

The experimental results in this section strongly support the validity of these facts. 

5.1. The test problems. For our computations, we fix a positive integer K, 
and we sample from the following family of polynomials of degree 2d in n variables: 

(5.1) f{xi,...,Xn) = xf + xf + ---+xl^ + g{xi,...,Xn) 
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where g G Z[a;i, . . . is a random polynomial of total degree < 2d ~ \ whose 
("^rf~^) coefficients are independently and uniformly distributed among integers 
between —K and K. Thus our family depends on three parameters: n, d and K. 
This family has been selected to ensure three important properties: 

Boundedness: The highest order terms xf'^ ensure that / is bounded be- 
low, and that the minimum value /* is achieved at some point p* G M". 
Efficient basis computation: When solving polynomial systems, the cal- 
culation of a Grobner basis is a time-consuming task. The structure of 



the polynomial (5.1) allows us to bypass this expensive step, since the set 
of n scaled partial derivatives a;^'*"^ + -h ' dg /dxj is already a Grobner 



basis with respect to total degree; cf. [CL097, §2.9, Proposition 4]. 
Simplicity: A main reason for this choice of model is its simplicity. While 
more sophisticated choices have other desirable mathematical properties 
(such as invariance under certain transformations) , we preferred to analyze 
here, as a first step, a relatively easy to describe set of instances. 



An important question is if the structure of the polynomials (5T) is somehow 
"biased" towards the application of sum of squares methods. This is a relevant 
issue, since the performance of algorithms on "random instances" sometimes pro- 
vides more information on the problem family, rather than on the algorithm itself. 
Concerning this question, we limit ourselves to notice that, for K sufficiently large. 



the family (5T) does include polynomials / with /*°* < /*. A simple example is 
f{x, y) = + + 2700 m(x, y), where ■m{x, y) is the Motzkin polynomial 

The polynomials in our family have global minima that generally have large 



negative values, of the order of —K^"^. This leads to ill-conditioning of the sym- 



metric matrices described in Lemma 3.1, and hence to numerical problems for the 



interior-point algorithm. Our remedy is a simple homogeneous scaling of the form 

fs{xi,...,Xn) = a^^"^ ■ f{axi, . . . ,axn), for some a > 0. 

Obviously, this does not affect the properties of being a sum of squares, or whether 
/* = f^"^. However, as is generally the rule in numerical optimization, this scaling 
step greatly affects both the speed and the accuracy of the SDP solution. 

5.2. Algorithms and software. Most of the test examples were run on a 
Pentium III 733Mhz with 256 MB, running Linux version 2.2.16-3, and using MAT- 
LAB version 5.3. Because of physical memory limitations, our largest examples 
(quartics in fifteen variables), were run on a Pentium III 650Mhz with 320 MB, 
under Windows 2000. The semidefinite programs were solved using the SDP solver 



SeDuMi Stu99 |, written by Jos Sturm. It is currently one of the most efficient 
codes available, at least for the restricted class of problems relevant here. SeDuMi 
can be run from within MATLAB, and implements a self-dual embedding technique. 
The default parameters are used, and the solutions computed are typically exact 
to machine precision (SeDuMi provides an estimate of the quality of the solution). 
The MATLAB Optimization toolbox was used for the implementation of a 



local search approach, to be described in Section 5.3. For the numerical homotopy 



method, we used the software PHCpack |Ver99], written by Jan Verschelde. The 



computation of the sparse matrix Tj was done using Macaulay 2 [|GS|, and its 
eigenvalues were numerically computed using MATLAB. 

We do not make strong claims about the efficiency of our implementations: 
while reasonable fast, for large scale problems considerable speedups are possible 
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2d \ n 


3 


5 


7 9 11 13 15 


4 


0.67 


28.9 


526 - - - - 


6 


12.3 


2643 




8 


70.6 






10 


508 







Table 3. Running time (in seconds) for the homotopy method. 



at the expense of customized algorithms. Nevertheless, we beUeve that the issues 
raised regarding the apphcabihty of algebra-based techniques to problems with large 
Bezout number remain valid, independently of the particular software employed. 

5.3. Standard local optimization. An alternative approach to the problem 
is given by traditional (nonconvex) numerical optimization. There exist many vari- 
ations, but arguably the most successful methods for relatively small problems such 
as the present ones are based on local gradient and Hessian information. Typical 
algorithms in this class employ an iterative scheme, combining the Newton search 



direction in combination with a line search |NW99|. These methods are reasonably 
fast in converging to a local minimum. For the larger problems in our family, they 
usually converge to a stationary point within 10 seconds. However, they often end 
up in the wrong solution, unless a very accurate starting point is given. 

The drawbacks of local optimization methods are well-known: lacking convex- 
ity, there are no guarantees of global (or even local) optimality. Worse, even if 
in the course of the optimization we actually reach the global minimum, there is 
usually no computationally feasible way of verifying optimality. 

Nevertheless, local optimization is an important tool for polynomial problems, 
as is the use of homotopy methods to trace the optimal value under small changes 
in the input data. It would interesting to investigate how these local numerical 
techniques can be best combined with the computations to be described next. 

5.4. Experimental results using computational algebra. In Table || we 
present typical running times for the homotopy based approach, described in Sec- 



tion 2.3. These were obtained running PHCpack in "black-box" mode (phc -b), 
that requires no user-specified parameters. The software traces all solutions (not 
necessarily real), its number being equal to the Bezout number. Comparing with 
Table |^, we can notice the adverse effect of large Bezout numbers in the practical 
performance of the algorithm, in spite of Verscheld e's i mpressive implementation. 



For the eigenvalue approach outlined in Section 2T , we compute the matrix T j 
using a straightforward implementation in Macaulay 2: the endomorphism Times/ 
is constructed, and applied to the elements of the monomial basis B. The resulting 
matrix, in a sparse floating point representation, is sent to a file for further process- 
ing. We found that the construction of the matrix Tf takes a surprisingly long time, 
for instance, it took Macaulay 2 over 10 minutes to produce the 125 x 125-matrix for 
2d = 6, n = 3. The eigenvalue problem itself is solved using MATLAB; it exploits 
the sparsity of the matrix, and runs reasonably fast. However, it appears that even 
a more efficient implementation of this method will not be able to compete with 
the timings in Table |^, let alone the timings in Table ||. 

After several discouraging attempts for small examples, we did not pursue a 
full implementation for the resultant-based methods sketched in Section 2.2. 
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2d \ n 


3 


5 


7 


9 


11 


13 


15 


4 


2000 


2000 


2000 


200 


20 


20 


2 


6 


2000 


200 


20 










8 


2000 


20 












10 


2000 















Table 4. Number of random instances in each category {K = 100, 1000, 10000). 



2d \ n 


3 


5 


7 


9 


11 


13 


15 


4 


0.2 


0.5 


4.4 


52 


361 


1994 


27400* 


6 


0.3 


21.2 


1046 










8 


1.2 


669 












10 


6.6 















Table 5. Running time (in seconds) for the semidefinite pro- 
grams. The marked instance was solved on a different machine. 



5.5. Experimental results using semidefinite programming. We ran 

several instances of polynomials in the family described above, for values of K 
equal to 100, 1000, and 10000. In Table | the typical running times for the semi- 
definite programming based approach on a single instance are presented. These are 
fairly constant across instances, and no special structure is exploited (besides what 
SeDuMi does internally). 

The number of random instances for each combination of the parameters is 
shown in Table ^. These values were chosen in order to keep the total computation 
time for a given category in the order of a few hours. 

Regarding the accuracy of the relaxation, in all cases tested the condition 
jsos _ J* satisfied. As explained in the previous section, this can be nu- 
merically verified by checking if the solution of the corresponding SDP has rank 
one, from which a candidate global minimizer is obtained. Evaluating the poly- 
nomial at this point provides an upper bound on the optimal value, that can be 
compared with the lower bound f^°^. In all our instances, the difference between 
these two quantities was extremely small, and within the range of numerical error. 

As an additional check, when we used different methods for solving the same in- 
stance, we have verified the solutions against each other. As expected, the solutions 
were numerically close, in many cases up to machine precision. 

In particular, it is noted that the approach can handle in a reasonable time 
(less than 35 min.) the case of a quartic polynomial in thirteen variables. Our 
largest examples have the same degree {2d = 4) and fifteen variables, correspond to 
an SDP with a matrix of dimensions 136 x 136 with 3876 auxiliary variables, and 
can be solved in a few hours. A quick glance at the corresponding Bezout number 
in Table Q makes clear the advantages of the presented approach. 

6. What Next ? 

We have demonstrated that the sums of squares relaxation is a powerful and 
practical technique in polynomial optimization. There are many open questions, 
both algorithmic and mathematical, which are raised by our experimental results. 
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One obvious question is h ow o ften does it occur that f^°^ = /* ? This can be stud- 
ied for our simple model (5.1), or, perhaps better, for various natural probability 
measures on the space of polynomials of bounded degree. This question is closely 
related to understanding the inclusion of the convex cone of forms that are sums 
of squares inside the cone of positive semidefinite forms. For the three-dimensional 
family of symmetric sextics, this problem was studied in detail by Choi, Lam and 
Reznick [CLR87]. Their work is an inspiration, but it also provides a warning as 
to how difficult the general case will be, even for ternary sextics without symmetry. 
We hope to pursue some of the following directions of inquiry in the near future. 

6.1. Sparseness and Symmetry. Most polynomial systems one encounters 
are sparse in the sense that there are only few monomials with nonzero coefficients. 



Methods involving Newton polytopes, such as sparse resultants [GKZ94] and poly- 
hedral homotopies |Ver|, are designed to deal with such problems. Symmetry with 
respect to finite matrix groups is another f eature o f many polynomial problems 
arising in practise. The book of Gatermann [ GatOO | is an excellent first reference. 

We wish to adapt our semidefinite programming approach to input polynomials 
/ w hich are sparse or symmetric or both. For instance, our polynomial example 
(2_^) is both sparse and invariant under permutation of the variables x, y, z. Both 
Newton polytope techniques and representation theory can be used to reduce the 
size of the matrices and the number of free parameters in the semi-definite programs. 



6.2. Higher degree relaxations. If we are unlucky, then the output pro- 
duced by SeDuMi will not satisfy the hypothesis of Proposition 4.1, and we conclude 
that the bound /*°* is probably strictly smaller than the optimal solution /*. In 
that event we redo our computation in higher degree, now with a larger SDP. The 
key idea is that even though /(x) — A may not be a sum of squares, if there exists a 
positive polynomial g{x) such that g{x) ■ {f{x) — A) is a sum of squares, then A < /*. 
The choice of g can be either made a priori (for instance, g = X^iLi ^?)' '-'^ ^ 



result of an optimization step (see [ParOOj for details). The Positivstellensatz (see 
Section 7) ensures that /* will be found if the degree of g is large enough. 

6.3. Solving polynomial equations. A natural application of Shor's relax- 
ation, hinted at in |Sho87], is solving polynomial systems 171(2;) = . ■ . = gr{x) = 0. 
The polynomial f{x) := j2l=i9iix) satisfies /* > /"°" > 0, and /* = holds if 
and only if the system has a real root. Clearly, f"" > is a sufficient condition 
for the nonexistence of real roots. An important open problem, essentially raised 
in |Sho87|, is to characterize inconsistent systems {gi, . . . ,gr} with /^°* = 0. On 
the other hand, if /* = f"'' = holds then it is possible, at least in principle, to 
obtain a numerical approximation of real roots using SDP. However, for a robust im- 
plementation, perturbation arguments are required and some important numerical 
issues arise, so the perspectives for practical applications are still unclear. 



6.4. Minimizing polynomials over polytopes. Consider a compact set 

P = {xeW : iiix)>0,...,esix)>0}, 

where £i is a linear form plus a constant, say, P is a polytope with s facets. Handel- 
man's Theorem |Han88| states that every polynomial which is strictly positive on 
P can be expressed as a positive linear combination of products £i{xy^ ■ ■ ■ isixY" ■ 
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Suppose we wish to minimize a given polynomial function f{x) over P. For D Cz N 
we define the D-th Handelman bound f'^^^ to be the largest A G M such that 

/(a;)-A = ^ c^^...^^ ■ ^l{xy^ ■ ■ ■ ^s{xY' for some c^.-.j^, > 0. 

■iiH H3<D 

Handelman's Theorem states that the increasing sequence Z^^-*, /^^^^\ f'^^^'^\... 
converges to the minimum of / over P. Each bound /(^^ can be computed using 
linear programming only. It would be interesting to study the quality of these 
bounds, and the running time of these linear programs, and to see how things 
improve as we augment the approach with semidefinite programming techniques. 

6.5. Which semialgebraic sets are semidefinite ? The feasible set of an 
SDP can be expressed by a linear matrix inequality, as in the dual formulation in 
Section 4.2. It would be interesting to study these feasible sets using techniques 
from real algebraic geometry, and to identify characteristic features of these sets. 
Here is a very concrete problem whose solution, to the best of our knowledge, is 
not known. Fix three real symmetric matrices A, B and C of size N x N. Then 

S = {{x,y) gR^ : xA + yB + C is positive semidefinite } 

is a closed, convex, semialgebraic subset of the plane M^. The problem is to find 
a good characterization of those subsets S. Given a semialgebraic subset 5 C ffi.^ 
which is closed and convex, how to decide whether a "semidefinite representation" 
exists, and, in the affirmative case, how to find matrices A, B, C of minimum size. 

7. Numerical Real Algebraic Geometry and The Positivstellensatz 

The first part of the above title refers to a paper by Sommese and Wampler 
(SW9§. This paper and other more recent ones suggest that numerical algorithms 
will play an increasingly important role in computational (complex) algebraic ge- 
ometry, and that polynomial systems will become much more visible in the context 
of ScientiRc Computation. Along the same lines, the fastest software for computing 
Grobner bases, due to Faugere [ Fau9g| ] , no longer uses the Buchberger algorithm 



but replaces it by sophisticated numerical linear algebra. Faugere's scheme 
Grobner Bases + Numerical Linear Algebra Polynomial Problems over 

has the potential of entering the standard repertoire of Scientific Computation. 



Following |ParOO] we propose an analogous scheme for the field of real numbers: 

Positivstellensatz + Semidefinite Programming Polynomial Problems over R. 

In what follows, we shall explain this relationship and why we see the Positivstellen- 
satz as the main catalyst for a future role of real algebra in scientific computation. 



The Positivstellensatz | BCR98 is a common generalization of Linear Pro- 
gramming Duality (for linear inequalities) and Hilbert's NuUstellensatz (for an al- 
gebraically closed field). It states that, for a system of polynomial equations and 
inequalities, either there exists a solution in M", or there exists a certain polyno- 
mial identity which bears witness to the fact that no solution exists. For instance, 
a single polynomial inequality f{x) < either has a solution x e R", or there 
exists an identity g{x)f{x) = h{x) where g and h are sums of squares. See [ BS89| 



for an exposition of the Positivstellensatz from the perspective of computational 



geometry. Finding a witness by linear programming is proposed in BS89, §7.3] 
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Here is our punchline, first stated in the dissertation of the first author ParOO 
A Positivstellensatz witness of bounded degree can be computed by semidcRnite 
programming. Here we can also optimize linear parameters in the coefficients. This 
suggests the following algorithm for deciding a system of polynomial equations and 
inequalities: decide whether there exists a witness for infeasibility of degree < D, 
for some Z) ^ 0. If our system is feasible, then we might like to minimize a 
polynomial f{x) over the solution set. The D-th SDP relaxation would be to ask 
for the largest real number A such that the given system together with the inequality 
/(x) — A < has an infeasibility witness of degree D. This generalizes what was 
proposed in Sections 6.2, 6.3 and 6.4. 

It is possible, at least in principle, to use an a priori bound for the degree D 
in the Positivestellensatz, however, the currently known bounds are still very large. 
Lombardi and Roy recently announced a bound which is triply-exponential in the 
number n of variables. We hope that such bounds can be further improved, at least 
for some natural families of polynomial problems arising in optimization. 

Here is a very simple example in the plane to illustrate our method: 



(7.1) 



/ 



y 



3 > 0, 



y 



0. 



By the Positivstellensatz, the system {/ > 0, <? = 0} has no solution {x, y) G 
if and only if there exist polynomials si, S2, S3 G M[a;, y] that satisfy the following: 

(7.2) si + §2 • / + 1 + S3 • g = , where si and S2 are sums of squares. 

The D-th SDP relaxation of the polynomial problem {/ > 0, g = 0} asks whether 
there exists a solution (si, S2, S3) to (7.2) where the polynomial si has degree < D 
and the polynomials S2, S3 have degree < D — 2. For each fixed integer D > this 
can be tested by semidefinite programming. For D = 2 we find the solution 



si = i + 2(y+|) +6(x-i) , S2 = 2, S3 = -6. 
The resulting identity (|7.2|) proves the inconsistency of the system {/ > 0, 5 = 0}. 
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